# Instructions for Data and Code Replication

James Cross, Derek Greene, Stefan Müller, and Martijn Schoonvelde
(2025). "Mapping Digital Campaign Strategies: How Political Candidates
Use Social Media to Communicate Constituency Connection and Policy
Stance." *Computational Communication Research.*

## Working directory and folder structure

Set your working directory to the location of the replication files, or
use a project/folder structure in VS Code (or a similar IDE) so that all
scripts can locate the required input and output files without path
errors.
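As a quick sanity check before running anything, a short script can confirm that the working directory contains the files named in this document. The sketch below is illustrative only: the list of expected paths is an assumption pieced together from file names mentioned in these instructions (the actual archive layout may differ), and `missing_files` is a hypothetical helper, not part of the replication materials.

```python
from pathlib import Path

# Assumed locations, taken from file names mentioned in this document.
# Adjust the list if the archive is organised differently.
EXPECTED_FILES = [
    "01_merge_and_clean.R",
    "02_analysis_descriptive.R",
    "03_regressions.do",
    "data_analysis.dta",
    "renv.lock",
    "code/classify-bow.py",
    "code/stopwords.txt",
]


def missing_files(root, expected=EXPECTED_FILES):
    """Return the expected paths that do not exist under `root`."""
    root = Path(root)
    return [rel for rel in expected if not (root / rel).is_file()]


if __name__ == "__main__":
    missing = missing_files(Path.cwd())
    if missing:
        print("Not found relative to the working directory:")
        for rel in missing:
            print("  " + rel)
    else:
        print("All expected files found.")
```

If the script reports missing files, the working directory is probably not set to the root of the replication folder.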

## Overview of replication steps

The replication materials involve four main steps:

1.  **Fine-tuning and comparing machine-learning classifiers**

    -   Implemented in Python.

    -   Input data (tweets) cannot be shared due to Twitter/X terms and
        conditions.

    -   All Python scripts are provided under an Apache 2.0 licence.

    -   Users can run the scripts with their own equivalent dataset to
        reproduce the results.

    -   More details on all scripts are provided in the Python
        Experiments section below.

2.  **Data merging and cleaning**

    -   Implemented in R script *01_merge_and_clean.R*.

    -   Merges and cleans several datasets for use in descriptive plots
        and statistical models.

    -   Plots require tweet data as input, which cannot be shared.

    -   However:

        -   The cleaned dataset without text content is provided as
            a .dta file.

        -   Intermediate files are provided to enable reproduction of
            all plots and tables in the paper without direct access to
            the tweet texts.

    -   To highlight the reproducibility of the analysis (performed on
        14 August 2025), we have added a rendered HTML file with all
        code and outputs (*01_merge_and_clean.html*).

3.  **Descriptive analysis**

    -   Implemented in R script *02_analysis_descriptive.R*.

    -   Generates descriptive tables and plots used in the article.

    -   To highlight the reproducibility of the analysis (performed on
        14 August 2025), we have added a rendered HTML file with all
        code and outputs (*02_analysis_descriptive.html*).

4.  **Regression models**

    -   Implemented in Stata do-file *03_regressions.do*.

    -   The script can be executed after loading *data_analysis.dta*.

    -   To highlight the reproducibility of the analysis (performed on
        14 August 2025), we have added a log file
        (*03_regressions.log*).

This project uses renv to manage R package versions and ensure
reproducibility.

### Initial setup

1.  Open the project in R (RStudio or Positron recommended).

2.  Run `install.packages("renv")`, then `renv::restore()`.

> This will install the exact package versions recorded
> in renv.lock into a project-specific library.

### Using packages

-   When you load the project, renv will automatically activate the
    isolated environment.

-   Use `library(package_name)` as usual; packages will be loaded from
    the project's private library, not your global R library.

-   To add a new package, run `renv::install("package_name")` followed
    by `renv::snapshot()`. This updates the environment and lockfile so
    others get the same version.

## R Environment

The analysis was reproduced successfully using the following package
versions and operating system:

```
> sessionInfo()
R version 4.4.2 (2024-10-31)
Platform: aarch64-apple-darwin20
Running under: macOS Sequoia 15.6

Matrix products: default
BLAS:   /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libBLAS.dylib 
LAPACK: /Library/Frameworks/R.framework/Versions/4.4-arm64/Resources/lib/libRlapack.dylib;  LAPACK version 3.12.0

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

time zone: Europe/Dublin
tzcode source: internal

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] quanteda.textstats_0.97.2 quanteda_4.3.1           
 [3] slider_0.3.2              zoo_1.8-14               
 [5] ggbeeswarm_0.7.2          scales_1.3.0             
 [7] haven_2.5.4               ggeffects_2.3.0          
 [9] xtable_1.8-4              texreg_1.39.4            
[11] lubridate_1.9.4           forcats_1.0.0            
[13] stringr_1.5.1             dplyr_1.1.4.9000         
[15] purrr_1.0.4               readr_2.1.5              
[17] tidyr_1.3.1               tibble_3.2.1             
[19] ggplot2_3.5.1             tidyverse_2.0.0          
[21] cowplot_1.1.3            

loaded via a namespace (and not attached):
 [1] generics_0.1.4   renv_1.1.5       stringi_1.8.7    lattice_0.22-6  
 [5] hms_1.1.3        magrittr_2.0.3   grid_4.4.2       timechange_0.3.0
 [9] Matrix_1.7-1     httr_1.4.7       stopwords_2.3    cli_3.6.5       
[13] rlang_1.1.6      munsell_0.5.1    withr_3.0.2      tools_4.4.2     
[17] tzdb_0.4.0       colorspace_2.1-1 fastmatch_1.1-6  vctrs_0.6.5     
[21] R6_2.6.1         lifecycle_1.0.4  vipor_0.4.7      insight_1.3.1   
[25] pkgconfig_2.0.3  beeswarm_0.4.0   warp_0.2.1       pillar_1.10.2   
[29] gtable_0.3.6     glue_1.8.0       Rcpp_1.0.14      tidyselect_1.2.1
[33] compiler_4.4.2   nsyllable_1.0.1 
```

## Python Experiments

Python scripts correspond to the experiments reported in the paper. All
paths are relative to the working directory. Set your working directory
accordingly or use a folder/project structure in VS Code to avoid path
errors.

## Experiments -- Dataset 1 (Images + Text)

**Experiment 1: Bag-of-words text classification**\
Apply standard classifiers with a bag-of-words model:

`python code/classify-bow.py data/dataset1/policy.csv -s code/stopwords.txt -o results/dataset1/policy-bow.csv`

`python code/classify-bow.py data/dataset1/electioneering.csv -s code/stopwords.txt -o results/dataset1/electioneering-bow.csv`
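For readers unfamiliar with the representation, a bag-of-words model reduces each text to word counts, ignoring word order, typically after removing stopwords. The following standard-library sketch illustrates the idea only; it is not the code in `classify-bow.py`, and the tokenisation rule, the function names, and the example texts are all assumptions made for illustration.

```python
import re
from collections import Counter


def tokenize(text, stopwords):
    """Lowercase, keep runs of letters/digits/apostrophes, drop stopwords."""
    tokens = re.findall(r"[a-z0-9']+", text.lower())
    return [t for t in tokens if t not in stopwords]


def bag_of_words(texts, stopwords=frozenset()):
    """Return (vocabulary, count vectors) for a list of texts."""
    counts = [Counter(tokenize(t, stopwords)) for t in texts]
    vocabulary = sorted(set().union(*counts)) if counts else []
    # One row per text, one column per vocabulary word.
    vectors = [[c[word] for word in vocabulary] for c in counts]
    return vocabulary, vectors


if __name__ == "__main__":
    # Made-up example texts, not taken from the replication data.
    texts = ["Vote for better housing", "Housing policy and health policy"]
    vocab, vectors = bag_of_words(texts, stopwords={"for", "and"})
    print(vocab)
    print(vectors)
```

Count vectors of this kind can then be passed to any standard classifier.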

**Experiment 2: BERT classification**\
Use pre-trained embeddings in combination with classifiers:

`python code/classify-bert.py data/dataset1/policy.csv -o results/dataset1/policy-bert-cv.csv`

`python code/classify-bert.py data/dataset1/electioneering.csv -o results/dataset1/electioneering-bert.csv`

**Experiment 3: Sentence-BERT classification**\
Use pre-trained sentence embeddings with multiple pre-trained models:

`python code/classify-sbert.py data/dataset1/policy.csv -m all-MiniLM-L6-v2 -o results/dataset1/policy-sbert_sentence+minilml6v2.csv`

`python code/classify-sbert.py data/dataset1/policy.csv -m all-mpnet-base-v2 -o results/dataset1/policy-sbert_sentence+mpnetbasev2.csv`

`python code/classify-sbert.py data/dataset1/policy.csv -m all-distilroberta-v1 -o results/dataset1/policy-sbert_sentence+distilrobertav1.csv`

`python code/classify-sbert.py data/dataset1/electioneering.csv -m all-MiniLM-L6-v2 -o results/dataset1/electioneering-sbert_sentence+minilml6v2.csv`

`python code/classify-sbert.py data/dataset1/electioneering.csv -m all-mpnet-base-v2 -o results/dataset1/electioneering-sbert_sentence+mpnetbasev2.csv`

`python code/classify-sbert.py data/dataset1/electioneering.csv -m all-distilroberta-v1 -o results/dataset1/electioneering-sbert_sentence+distilrobertav1.csv`

## Experiments -- Dataset 2 (Text-Only)

**Experiment 1: Bag-of-words text classification**

`python code/classify-bow.py data/dataset2/policy.csv -s code/stopwords.txt -o results/dataset2/policy-bow.csv`

`python code/classify-bow.py data/dataset2/electioneering.csv -s code/stopwords.txt -o results/dataset2/electioneering-bow.csv`

**Experiment 2: BERT classification**

`python code/classify-bert.py data/dataset2/policy.csv -o results/dataset2/policy-bert.csv`

`python code/classify-bert.py data/dataset2/electioneering.csv -o results/dataset2/electioneering-bert.csv`

**Experiment 3: Sentence-BERT classification**

`python code/classify-sbert.py data/dataset2/policy.csv -m all-MiniLM-L6-v2 -o results/dataset2/policy-sbert_sentence+minilml6v2.csv`

`python code/classify-sbert.py data/dataset2/policy.csv -m all-mpnet-base-v2 -o results/dataset2/policy-sbert_sentence+mpnetbasev2.csv`

`python code/classify-sbert.py data/dataset2/policy.csv -m all-distilroberta-v1 -o results/dataset2/policy-sbert_sentence+distilrobertav1.csv`

`python code/classify-sbert.py data/dataset2/electioneering.csv -m all-MiniLM-L6-v2 -o results/dataset2/electioneering-sbert_sentence+minilml6v2.csv`

`python code/classify-sbert.py data/dataset2/electioneering.csv -m all-mpnet-base-v2 -o results/dataset2/electioneering-sbert_sentence+mpnetbasev2.csv`

`python code/classify-sbert.py data/dataset2/electioneering.csv -m all-distilroberta-v1 -o results/dataset2/electioneering-sbert_sentence+distilrobertav1.csv`

## Experiments -- Combined Dataset

**Experiment 1: Bag-of-words text classification**

`python code/classify-bow.py data/combined/policy.csv -s code/stopwords.txt -o results/combined/policy-bow.csv`

`python code/classify-bow.py data/combined/electioneering.csv -s code/stopwords.txt -o results/combined/electioneering-bow.csv`

**Experiment 2: BERT classification**

`python code/classify-bert.py data/combined/policy.csv -o results/combined/policy-bert.csv`

`python code/classify-bert.py data/combined/electioneering.csv -o results/combined/electioneering-bert.csv`

**Experiment 3: Sentence-BERT classification**

`python code/classify-sbert.py data/combined/policy.csv -m all-MiniLM-L6-v2 -o results/combined/policy-sbert_sentence+minilml6v2.csv`

`python code/classify-sbert.py data/combined/policy.csv -m all-mpnet-base-v2 -o results/combined/policy-sbert_sentence+mpnetbasev2.csv`

`python code/classify-sbert.py data/combined/policy.csv -m all-distilroberta-v1 -o results/combined/policy-sbert_sentence+distilrobertav1.csv`

`python code/classify-sbert.py data/combined/electioneering.csv -m all-MiniLM-L6-v2 -o results/combined/electioneering-sbert_sentence+minilml6v2.csv`

`python code/classify-sbert.py data/combined/electioneering.csv -m all-mpnet-base-v2 -o results/combined/electioneering-sbert_sentence+mpnetbasev2.csv`

`python code/classify-sbert.py data/combined/electioneering.csv -m all-distilroberta-v1 -o results/combined/electioneering-sbert_sentence+distilrobertav1.csv`
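The Sentence-BERT commands in the three dataset sections above follow one regular pattern, so they can also be generated programmatically. The sketch below rebuilds the 18 commands from the dataset folders, tasks, and model names listed in this document; `sbert_commands` and `model_tag` are hypothetical helpers, not part of the replication code, and the sketch assumes Python 3.9 or later (for `str.removeprefix`). Execution is left commented out, so by default the script only prints the commands.

```python
import shlex
import subprocess  # only needed if the run line below is uncommented

DATASETS = ["dataset1", "dataset2", "combined"]
TASKS = ["policy", "electioneering"]
MODELS = ["all-MiniLM-L6-v2", "all-mpnet-base-v2", "all-distilroberta-v1"]


def model_tag(model):
    """Map a model name to the tag used in the output file names,
    e.g. 'all-MiniLM-L6-v2' -> 'minilml6v2'."""
    return model.removeprefix("all-").replace("-", "").lower()


def sbert_commands():
    """Build one argument list per dataset/task/model combination."""
    commands = []
    for dataset in DATASETS:
        for task in TASKS:
            for model in MODELS:
                out = f"results/{dataset}/{task}-sbert_sentence+{model_tag(model)}.csv"
                commands.append([
                    "python", "code/classify-sbert.py",
                    f"data/{dataset}/{task}.csv",
                    "-m", model,
                    "-o", out,
                ])
    return commands


if __name__ == "__main__":
    for cmd in sbert_commands():
        print(shlex.join(cmd))
        # subprocess.run(cmd, check=True)  # uncomment to execute
```

Passing each command as an argument list avoids shell-quoting problems with characters such as `+` in the output file names.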

## Experiments -- Classification of Unseen Data

Build the language model for the unseen data:

`python code/build-sbert.py data/raw/data_tweets_all.csv --seed=101 -m all-distilroberta-v1 -o models/sbert_distilrobertav1.bin`

Classify all unseen tweets:

`python code/classify-unseen.py data/raw/data_tweets_all.csv models/sbert_distilrobertav1.bin -t policy --seed=101 -o results/unseen/policy-sbert_distilrobertav1.csv`

`python code/classify-unseen.py data/raw/data_tweets_all.csv models/sbert_distilrobertav1.bin -t electioneering --seed=101 -o results/unseen/electioneering-sbert_distilrobertav1.csv`

## Experiments -- Classifier Verification

Final verification on a third annotated hold-out set (400 tweets with
images + text, 400 tweets with text only):

`python code/classify-final.py data/combined/policy.csv data/dataset3/policy.csv -t policy --seed=101 -m all-distilroberta-v1 -o results/final/policy-sbert_distilrobertav1.csv`

`python code/classify-final.py data/combined/electioneering.csv data/dataset3/electioneering.csv -t electioneering --seed=101 -m all-distilroberta-v1 -o results/final/electioneering-sbert_distilrobertav1.csv`
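These instructions do not state which metrics `classify-final.py` reports. As a generic illustration of how predictions on a hold-out set can be scored against human annotations, the sketch below computes precision, recall, and F1 for a binary label. The function name, the 0/1 label coding, and the toy labels are assumptions for illustration, not the authors' evaluation code or data.

```python
def precision_recall_f1(y_true, y_pred, positive=1):
    """Precision, recall, and F1 for the `positive` class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    # Guard against division by zero when a class is never predicted/present.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1


if __name__ == "__main__":
    # Toy labels, not taken from the replication data.
    annotated = [1, 1, 0, 0]
    predicted = [1, 0, 1, 0]
    print(precision_recall_f1(annotated, predicted))
```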

## Contact

For any questions about the replication process, please contact the
authors.
